{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 07 菜品分类-数据分组/数据透视表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from datetime import datetime\n",
    "# 一个cell输出多行语句\n",
    "from IPython.core.interactiveshell import InteractiveShell\n",
    "InteractiveShell.ast_node_interactivity = \"all\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 一、数据分组\n",
    "根据一个或多个键（函数、数组或df列名）将数据分成**若干组**，然后对分组后的数据分别进行**汇总计算**，并将汇总后的结果进行**合并**。\n",
    "\n",
    "被用作汇总计算的函数称为聚合函数。\n",
    "\n",
    "数据分组流程如下：\n",
    "\n",
    "<img src='./image/groupby_flow.jpg' width='60%'>\n",
    "\n",
    "Excel实现：前提要先对分组的那一列（键）进行排序（升序或降序）\n",
    "\n",
    "<img src='./image/groupby_excel.jpg' width='50%'>\n",
    "\n",
    "选中分组区域，单击**数据**-》**分类汇总**\n",
    "\n",
    "<img src='./image/groupby_excel2.jpg' width='60%'>\n",
    "\n",
    "Excel常见的分类汇总方式：\n",
    "\n",
    "<img src='./image/groupby_excel3.jpg' width='60%'>\n",
    "\n",
    "Python实现：**groupby()**方法\n",
    "\n",
    "### 1.1 分组键是列名\n",
    "\n",
    "#### 1.1.1 按照一列进行分组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>客户分类</th>\n",
       "      <th>区域</th>\n",
       "      <th>是否省会</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>59224</td>\n",
       "      <td>A类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>是</td>\n",
       "      <td>6</td>\n",
       "      <td>20</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>55295</td>\n",
       "      <td>B类</td>\n",
       "      <td>三线城市</td>\n",
       "      <td>否</td>\n",
       "      <td>37</td>\n",
       "      <td>27</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>46035</td>\n",
       "      <td>A类</td>\n",
       "      <td>二线城市</td>\n",
       "      <td>是</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2459</td>\n",
       "      <td>C类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>是</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>22179</td>\n",
       "      <td>B类</td>\n",
       "      <td>三线城市</td>\n",
       "      <td>否</td>\n",
       "      <td>9</td>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>22557</td>\n",
       "      <td>A类</td>\n",
       "      <td>二线城市</td>\n",
       "      <td>是</td>\n",
       "      <td>42</td>\n",
       "      <td>20</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    用户ID 客户分类    区域 是否省会  7月销量  8月销量  9月销量\n",
       "0  59224   A类  一线城市    是     6    20     0\n",
       "1  55295   B类  三线城市    否    37    27    35\n",
       "2  46035   A类  二线城市    是     8     1     8\n",
       "3   2459   C类  一线城市    是     7     8    14\n",
       "4  22179   B类  三线城市    否     9    12     4\n",
       "5  22557   A类  二线城市    是    42    20    55"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./data/train-pivot.csv', encoding='gbk')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x1185022d0>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>区域</th>\n",
       "      <th>是否省会</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      用户ID  区域  是否省会  7月销量  8月销量  9月销量\n",
       "客户分类                                  \n",
       "A类       3   3     3     3     3     3\n",
       "B类       2   2     2     2     2     2\n",
       "C类       1   1     1     1     1     1"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>127816</td>\n",
       "      <td>56</td>\n",
       "      <td>41</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>77474</td>\n",
       "      <td>46</td>\n",
       "      <td>39</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>2459</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        用户ID  7月销量  8月销量  9月销量\n",
       "客户分类                          \n",
       "A类    127816    56    41    63\n",
       "B类     77474    46    39    39\n",
       "C类      2459     7     8    14"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('客户分类')\n",
    "# 返回的不是DataFrame对象，而是DataFrameGroupBy对象\n",
    "# 里面包含着分组以后的若干组数据，但是没有直接显示出来，需要进行汇总计算后才会显示\n",
    "\n",
    "# 对分组进行计数运算\n",
    "# 如果对分组进行数值运算，只有数据类型是数值（int，float）的列才会参与运算\n",
    "df.groupby('客户分类').count()\n",
    "\n",
    "# 求和运算\n",
    "df.groupby('客户分类').sum()\n",
    "\n",
    "# 这些函数称为聚合函数，《开始烹调-数据运算》中汇总运算都适用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.1.2 按照多列进行分组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>是否省会</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th>区域</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td rowspan=\"2\" valign=\"top\">A类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>二线城市</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>三线城市</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           用户ID  是否省会  7月销量  8月销量  9月销量\n",
       "客户分类 区域                                \n",
       "A类   一线城市     1     1     1     1     1\n",
       "     二线城市     2     2     2     2     2\n",
       "B类   三线城市     2     2     2     2     2\n",
       "C类   一线城市     1     1     1     1     1"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th>区域</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td rowspan=\"2\" valign=\"top\">A类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>59224</td>\n",
       "      <td>6</td>\n",
       "      <td>20</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>二线城市</td>\n",
       "      <td>68592</td>\n",
       "      <td>50</td>\n",
       "      <td>21</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>三线城市</td>\n",
       "      <td>77474</td>\n",
       "      <td>46</td>\n",
       "      <td>39</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>2459</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            用户ID  7月销量  8月销量  9月销量\n",
       "客户分类 区域                           \n",
       "A类   一线城市  59224     6    20     0\n",
       "     二线城市  68592    50    21    63\n",
       "B类   三线城市  77474    46    39    39\n",
       "C类   一线城市   2459     7     8    14"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "客户分类\n",
       "A类    3\n",
       "B类    2\n",
       "C类    1\n",
       "Name: 用户ID, dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['客户分类', '区域']).count()\n",
    "df.groupby(['客户分类', '区域']).sum()\n",
    "\n",
    "# 有时候不需要对所有的列进行汇总运算，方法：将想要计算的列通过索引方式取出来，再在该列上进行汇总\n",
    "df.groupby('客户分类')['用户ID'].count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 分组键是Series\n",
    "DataFrame取出一列就是Series，比如df['客户分类']就是一个Series\n",
    "\n",
    "#### 1.2.1 按照一个Series进行分组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>区域</th>\n",
       "      <th>是否省会</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      用户ID  区域  是否省会  7月销量  8月销量  9月销量\n",
       "客户分类                                  \n",
       "A类       3   3     3     3     3     3\n",
       "B类       2   2     2     2     2     2\n",
       "C类       1   1     1     1     1     1"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(df['客户分类']).count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th>区域</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td rowspan=\"2\" valign=\"top\">A类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>59224</td>\n",
       "      <td>6</td>\n",
       "      <td>20</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>二线城市</td>\n",
       "      <td>68592</td>\n",
       "      <td>50</td>\n",
       "      <td>21</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>三线城市</td>\n",
       "      <td>77474</td>\n",
       "      <td>46</td>\n",
       "      <td>39</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>2459</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            用户ID  7月销量  8月销量  9月销量\n",
       "客户分类 区域                           \n",
       "A类   一线城市  59224     6    20     0\n",
       "     二线城市  68592    50    21    63\n",
       "B类   三线城市  77474    46    39    39\n",
       "C类   一线城市   2459     7     8    14"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([df['客户分类'], df['区域']]).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "客户分类\n",
       "A类    3\n",
       "B类    2\n",
       "C类    1\n",
       "Name: 用户ID, dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(df['客户分类'])['用户ID'].count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 神奇的aggregate方法\n",
    "前面用到的聚合函数都是在DataFrameGroupBy上面调用的，分组后所有的列都做同一种运算，一次只能使用一种汇总方式\n",
    "\n",
    "aggregate一次可以使用多种方式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">用户ID</th>\n",
       "      <th colspan=\"2\" halign=\"left\">区域</th>\n",
       "      <th colspan=\"2\" halign=\"left\">是否省会</th>\n",
       "      <th colspan=\"2\" halign=\"left\">7月销量</th>\n",
       "      <th colspan=\"2\" halign=\"left\">8月销量</th>\n",
       "      <th colspan=\"2\" halign=\"left\">9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>sum</th>\n",
       "      <th>count</th>\n",
       "      <th>sum</th>\n",
       "      <th>count</th>\n",
       "      <th>sum</th>\n",
       "      <th>count</th>\n",
       "      <th>sum</th>\n",
       "      <th>count</th>\n",
       "      <th>sum</th>\n",
       "      <th>count</th>\n",
       "      <th>sum</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>3</td>\n",
       "      <td>127816</td>\n",
       "      <td>3</td>\n",
       "      <td>一线城市二线城市二线城市</td>\n",
       "      <td>3</td>\n",
       "      <td>是是是</td>\n",
       "      <td>3</td>\n",
       "      <td>56</td>\n",
       "      <td>3</td>\n",
       "      <td>41</td>\n",
       "      <td>3</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>2</td>\n",
       "      <td>77474</td>\n",
       "      <td>2</td>\n",
       "      <td>三线城市三线城市</td>\n",
       "      <td>2</td>\n",
       "      <td>否否</td>\n",
       "      <td>2</td>\n",
       "      <td>46</td>\n",
       "      <td>2</td>\n",
       "      <td>39</td>\n",
       "      <td>2</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>1</td>\n",
       "      <td>2459</td>\n",
       "      <td>1</td>\n",
       "      <td>一线城市</td>\n",
       "      <td>1</td>\n",
       "      <td>是</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      用户ID            区域                是否省会       7月销量      8月销量      9月销量  \\\n",
       "     count     sum count           sum count  sum count sum count sum count   \n",
       "客户分类                                                                          \n",
       "A类       3  127816     3  一线城市二线城市二线城市     3  是是是     3  56     3  41     3   \n",
       "B类       2   77474     2      三线城市三线城市     2   否否     2  46     2  39     2   \n",
       "C类       1    2459     1          一线城市     1    是     1   7     1   8     1   \n",
       "\n",
       "          \n",
       "     sum  \n",
       "客户分类      \n",
       "A类    63  \n",
       "B类    39  \n",
       "C类    14  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>3</td>\n",
       "      <td>56</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>2</td>\n",
       "      <td>46</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      用户ID  7月销量  8月销量\n",
       "客户分类                  \n",
       "A类       3    56    41\n",
       "B类       2    46    39\n",
       "C类       1     7     8"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 先做计数，再做求和运算\n",
    "df.groupby('客户分类').aggregate(['count', 'sum'])\n",
    "# 对用户ID进行计数，对7、8月销量进行求和\n",
    "df.groupby('客户分类').aggregate({'用户ID': 'count', '7月销量':'sum', '8月销量': 'sum'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 对分组后的结果重置索引\n",
    "DataFrameGroupBy对象经过汇总运算以后的形式并不是标准的DataFrame形式\n",
    "\n",
    "通过**reset_index()**方法将非标准转化为标准DataFrame形式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>用户ID</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>127816</td>\n",
       "      <td>56</td>\n",
       "      <td>41</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>77474</td>\n",
       "      <td>46</td>\n",
       "      <td>39</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>2459</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        用户ID  7月销量  8月销量  9月销量\n",
       "客户分类                          \n",
       "A类    127816    56    41    63\n",
       "B类     77474    46    39    39\n",
       "C类      2459     7     8    14"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>客户分类</th>\n",
       "      <th>用户ID</th>\n",
       "      <th>7月销量</th>\n",
       "      <th>8月销量</th>\n",
       "      <th>9月销量</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>A类</td>\n",
       "      <td>127816</td>\n",
       "      <td>56</td>\n",
       "      <td>41</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>B类</td>\n",
       "      <td>77474</td>\n",
       "      <td>46</td>\n",
       "      <td>39</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>C类</td>\n",
       "      <td>2459</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  客户分类    用户ID  7月销量  8月销量  9月销量\n",
       "0   A类  127816    56    41    63\n",
       "1   B类   77474    46    39    39\n",
       "2   C类    2459     7     8    14"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('客户分类').sum()\n",
    "df.groupby('客户分类').sum().reset_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 二、数据透视表\n",
    "数据透视表和数据分组类似但又不同，数据分组在**一维（行）方向**上不断拆分，而数据透视表是在**行、列方向**上同时拆分\n",
    "\n",
    "Excel实现：\n",
    "\n",
    "<img src='./image/pivot_excel.jpg' />\n",
    "\n",
    "Python实现：\n",
    "\n",
    "pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')\n",
    "\n",
    "* data：表示要做数据透视表的整个表\n",
    "* values：对应Excel中值那个框\n",
    "* index：对应Excel中行那个框\n",
    "* columns：对应Excel列那个框\n",
    "* aggfunc：表示对values的计算类型\n",
    "* fill_value：表示对空值的填充值\n",
    "* margins：表示是否显示合计列\n",
    "* dropna：表示是否删除缺失，如果为真时，则把一整行全作为缺失值删除\n",
    "* margins_name：表示合计列的列名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>区域</th>\n",
       "      <th>一线城市</th>\n",
       "      <th>三线城市</th>\n",
       "      <th>二线城市</th>\n",
       "      <th>总计</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>总计</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "区域    一线城市  三线城市  二线城市  总计\n",
       "客户分类                      \n",
       "A类       1     0     2   3\n",
       "B类       0     2     0   2\n",
       "C类       1     0     0   1\n",
       "总计       2     2     2   6"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 客户分类作为index，区域作为columns，用户ID作为values，对values执行count运算\n",
    "pd.pivot_table(df, values='用户ID', columns='区域', index='客户分类', \n",
    "               aggfunc='count', margins=True, margins_name='总计',\n",
    "               fill_value=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">7月销量</th>\n",
       "      <th colspan=\"3\" halign=\"left\">用户ID</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区域</th>\n",
       "      <th>一线城市</th>\n",
       "      <th>三线城市</th>\n",
       "      <th>二线城市</th>\n",
       "      <th>一线城市</th>\n",
       "      <th>三线城市</th>\n",
       "      <th>二线城市</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>客户分类</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>A类</td>\n",
       "      <td>6.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>50.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>B类</td>\n",
       "      <td>NaN</td>\n",
       "      <td>46.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>C类</td>\n",
       "      <td>7.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     7月销量             用户ID          \n",
       "区域   一线城市  三线城市  二线城市 一线城市 三线城市 二线城市\n",
       "客户分类                                \n",
       "A类    6.0   NaN  50.0  1.0  NaN  2.0\n",
       "B类    NaN  46.0   NaN  NaN  2.0  NaN\n",
       "C类    7.0   NaN   NaN  1.0  NaN  NaN"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>客户分类</th>\n",
       "      <th colspan=\"3\" halign=\"left\">7月销量</th>\n",
       "      <th colspan=\"3\" halign=\"left\">用户ID</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区域</th>\n",
       "      <th></th>\n",
       "      <th>一线城市</th>\n",
       "      <th>三线城市</th>\n",
       "      <th>二线城市</th>\n",
       "      <th>一线城市</th>\n",
       "      <th>三线城市</th>\n",
       "      <th>二线城市</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>A类</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>50</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>B类</td>\n",
       "      <td>0</td>\n",
       "      <td>46</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>C类</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   客户分类 7月销量           用户ID          \n",
       "区域      一线城市 三线城市 二线城市 一线城市 三线城市 二线城市\n",
       "0    A类    6    0   50    1    0    2\n",
       "1    B类    0   46    0    0    2    0\n",
       "2    C类    7    0    0    1    0    0"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 对不同值进行不同的计算\n",
    "pd.pivot_table(df, values=['用户ID', '7月销量'], columns='区域',\n",
    "               index='客户分类', aggfunc={'用户ID':'count', '7月销量': 'sum'})\n",
    "pd.pivot_table(df, values=['用户ID', '7月销量'], columns='区域',\n",
    "               index='客户分类', aggfunc={'用户ID':'count', '7月销量': 'sum'}, fill_value=0).reset_index()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
